A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency

نویسندگان

  • Sakriani Sakti
  • Konstantin Markov
  • Satoshi Nakamura
چکیده

The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediate preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that they are insufficient in capturing all of the coarticulation effects. A wider phonetic context seems to be more appropriate, but often suffers from the data sparsity problem and memory constraints. Therefore, an efficient modeling of wider contexts needs to be addressed to achieve a realistic application for an automatic speech recognition system. This paper presents a new method of modeling pentaphone-context units using the hybrid HMM/BN acoustic modeling framework. Rather than modeling pentaphones explicitly, in this approach the probabilistic dependencies between the triphone context unit and the second preceding/following contexts are incorporated into the triphone state output distributions by means of the BN. The advantages of this approach are that we are able to extend the modeled phonetic context within the triphone framework, and we can use a standard decoding system by assuming the next preceding/following context variables hidden during the recognition. To handle the increased parameter number, tying using knowledge-based phoneme classes and a data-driven clustering method is applied. The evaluation experiments indicate that the proposed model outperforms the standard HMM based triphone model, achieving a 9–10% relative word error rate (WER) reduction. key words: wide phonetic context model, pentaphone, HMM/BN acoustic model

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues

In recent years, the number of studies investigating new directions in speech modeling that goes beyond the conventional HMM has increased considerably. One promising approach is to use Bayesian Networks (BN) as speech models. Full recognition systems based on Dynamic BN as well as acoustic models using BN have been proposed lately. Our group at ATR has been developing a hybrid HMM/BN model, wh...

متن کامل

A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition

In current HMM based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Bayesian Networks (BN) allow for easy combination of different continuous as well as discrete features by exploring conditional dependencies between them. However, the lack of efficient algorit...

متن کامل

Advanced Acoustic Modeling with the Hybrid HMM/BN Framework

Most of the current state-of-the-art speech recognition systems are based on HMMs which usually use mixture of Gaussian functions as state probability distribution model. It is a common practice to use EM algorithm for Gaussian mixture parameter learning. In this case, the learning is done in a ”blind”, data-driven way without taking into account how the speech signal has been produced and whic...

متن کامل

Hybrid HMM/BN LVCSR system integrating multiple acoustic features

In current HMM based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Dynamic Bayesian Networks (DBN) allow for easy combination of different features and make use of conditional dependencies between them. However, lack of efficient algorithms has prevented their...

متن کامل

Forward-backwards training of hybrid HMM/BN acoustic models

In this paper, we describe an application of the Forward-Backwards (F-B) algorithm for maximum likelihood training of hybrid HMM/Bayesian Network (BN) acoustic models. Previously, HMM/BN parameter estimation was based on a Viterbi training algorithm that requires two passes over the training data: one for BN learning and one for updating HMM transition probabilities. In this work, we first anal...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IEICE Transactions

دوره 89-D  شماره 

صفحات  -

تاریخ انتشار 2006